Adding logic to master_is_stable indicator to check for discovery problems #88020

Merged

masseyke
Member

@masseyke masseyke commented Jun 24, 2022

This PR builds on #86524, #87482, and #87306 by supporting the case where there has been no master node in the last 30 seconds, no node has been elected master, and the current node is master eligible. This is branch 1.2.2.4 in the diagram at #87482 (comment).
The outline of the logic is that when we see that the master node has gone null, we start polling the other master-eligible nodes for their ClusterFormationState. Once a diagnoseMasterStability() request comes in, we look at the ClusterFormationStates from all of the master-eligible nodes, and the result is one of the following:

  1. We have received an exception from one of the other master-eligible nodes (either on that node or a timeout), and return RED (1.2.2.4.1)
  2. We realize that some nodes report that they have not discovered all of the other master-eligible nodes, and return RED (1.2.2.4.2)
  3. We realize that some nodes report that there is no quorum, and return RED (1.2.2.4.3.1)
  4. We realize that every node thinks there is a quorum, so some other problem is occurring, and we return RED (1.2.2.4.3.2)

Note that in this PR we are not returning all of the details described in the diagram (such as which nodes cannot discover which other nodes). Instead we're only giving the details from the local ClusterFormationState. Once we figure out how the details will be used we will add the other information in a later PR.
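The order in which the four cases are checked can be sketched as follows. This is a minimal illustration, not the code in this PR: `NodeReport`, `MasterStabilitySketch`, and the boolean fields are hypothetical stand-ins for what is derived from each node's ClusterFormationState, and the returned strings only mirror the summary suffixes shown in the example responses. Every branch corresponds to a RED status.

```java
import java.util.List;

/** Hypothetical, simplified stand-in for what one master-eligible node reported. */
record NodeReport(String nodeName, boolean fetchFailed, boolean discoveredAllMasterEligibleNodes, boolean hasQuorum) {}

final class MasterStabilitySketch {
    /** Returns the summary suffix for the first matching case, checked in the order of the list above. */
    static String diagnose(List<NodeReport> reports) {
        for (NodeReport report : reports) {
            if (report.fetchFailed()) { // case 1 (1.2.2.4.1): exception or timeout while polling
                return "an exception occurred while reaching out to " + report.nodeName() + " for diagnosis";
            }
        }
        if (reports.stream().anyMatch(r -> r.discoveredAllMasterEligibleNodes() == false)) { // case 2 (1.2.2.4.2)
            return "some master eligible nodes are unable to discover other master eligible nodes";
        }
        if (reports.stream().anyMatch(r -> r.hasQuorum() == false)) { // case 3 (1.2.2.4.3.1)
            return "the master eligible nodes are unable to form a quorum";
        }
        return "the cause has not been determined."; // case 4 (1.2.2.4.3.2)
    }

    public static void main(String[] args) {
        List<NodeReport> reports = List.of(
            new NodeReport("node_t0", false, true, true),
            new NodeReport("node_t1", false, false, true)
        );
        System.out.println("RED: No master node observed recently, and " + diagnose(reports));
    }
}
```

The exception check runs first because a missing report makes the discovery and quorum checks unreliable.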

Here is an example response for case 1 above:

{
    "status": "red",
    "cluster_name": "TEST-TEST_WORKER_VM=[--not-gradle--]-CLUSTER_SEED=[-8835703766361244273]-HASH=[11CD8E4183A082]-cluster",
    "components": {
        "cluster_coordination": {
            "status": "red",
            "indicators": {
                "master_is_stable": {
                    "status": "red",
                    "summary": "No master node observed in the last 1s, and an exception occurred while reaching out to node_t1 for diagnosis",
                    "help_url": "https://ela.st/fix-master",
                    "details": {
                        "current_master": {
                            "node_id": null,
                            "name": null
                        },
                        "recent_masters": [
                            {
                                "node_id": "EuJQ4HDWQRSSTPFRd9MkGw",
                                "name": "node_t0"
                            }
                        ],
                        "exception_fetching_history": {
                            "message": "Artificial failure",
                            "stack_trace": "java.lang.RuntimeException: Artificial failure\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$ClusterFormationStateOrException.<init>(CoordinationDiagnosticsService.java:640)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$1$1.onResponse(CoordinationDiagnosticsService.java:604)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$1$1.onResponse(CoordinationDiagnosticsService.java:592)\n\tat org.elasticsearch.action.ActionListener$RunBeforeActionListener.onResponse(ActionListener.java:415)\n\tat org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1337)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1422)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1402)\n\tat org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:41)\n\tat org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:39)\n\tat org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20)\n\tat org.elasticsearch.action.admin.cluster.coordination.ClusterFormationInfoAction$TransportAction.doExecute(ClusterFormationInfoAction.java:137)\n\tat org.elasticsearch.action.admin.cluster.coordination.ClusterFormationInfoAction$TransportAction.doExecute(ClusterFormationInfoAction.java:120)\n\tat org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:79)\n\tat org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:54)\n\tat 
org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:71)\n\tat org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:67)\n\tat org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:67)\n\tat org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:908)\n\tat org.elasticsearch.transport.TransportService$3.sendRequest(TransportService.java:123)\n\tat org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:848)\n\tat org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:737)\n\tat org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:683)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$1.onResponse(CoordinationDiagnosticsService.java:587)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService$1.onResponse(CoordinationDiagnosticsService.java:581)\n\tat org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:411)\n\tat org.elasticsearch.cluster.coordination.CoordinationDiagnosticsService.lambda$beginPollingClusterFormationInfo$3(CoordinationDiagnosticsService.java:577)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)\n\tat java.base/java.util.concurrent.FutureTask.runAndReset$$$capture(FutureTask.java:305)\n\tat java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\n"
                        },
                        "cluster_formation": {
                            "node1": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [EuJQ4HDWQRSSTPFRd9MkGw, b-6204euRty75T3FWMuykA, 6EBYghrnQrWE8yiwTdlGTg], have discovered possible quorum [{node_t1}{6EBYghrnQrWE8yiwTdlGTg}{euqgRAHyRxG0IJWfARa9Hw}{node_t1}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{b-6204euRty75T3FWMuykA}{u7NSjyF3RfmT1rCRD-F7Wg}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}, {node_t0}{EuJQ4HDWQRSSTPFRd9MkGw}{nvCD3YwyRDu3tFN1_7JLlA}{node_t0}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}]; discovery will continue using [127.0.0.1:13302, 127.0.0.1:13303] from hosts providers and [{node_t2}{b-6204euRty75T3FWMuykA}{u7NSjyF3RfmT1rCRD-F7Wg}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}, {node_t0}{EuJQ4HDWQRSSTPFRd9MkGw}{nvCD3YwyRDu3tFN1_7JLlA}{node_t0}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}, {node_t1}{6EBYghrnQrWE8yiwTdlGTg}{euqgRAHyRxG0IJWfARa9Hw}{node_t1}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}] from last-known cluster state; node term 1, last-accepted version 3 in term 1",
                            "node2": "master not discovered or elected yet...",
                            "node3": "master not discovered or elected yet..."
                        }
                    },
                    "impacts": [...],
                    "user_actions": [...]
                }
            }
        },
        "data": {...},
        "snapshot": {...}
    }
}

And an example from case 2:

{
    "status": "red",
    "cluster_name": "TEST-TEST_WORKER_VM=[--not-gradle--]-CLUSTER_SEED=[-6341988252367058471]-HASH=[11CC4EF6482AF0]-cluster",
    "components": {
        "cluster_coordination": {
            "status": "red",
            "indicators": {
                "master_is_stable": {
                    "status": "red",
                    "summary": "No master node observed in the last 30s, and some master eligible nodes are unable to discover other master eligible nodes",
                    "help_url": "https://ela.st/fix-master",
                    "details": {
                        "current_master": {
                            "node_id": null,
                            "name": null
                        },
                        "recent_masters": [
                            {
                                "node_id": "LBrijWVtTiWZm03kHDE1iw",
                                "name": "node_t2"
                            }
                        ],
                        "cluster_formation": {
                            "node1": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [R25HgyJkTVmfr2j7Lv3F_Q, LBrijWVtTiWZm03kHDE1iw, okA4QNE9R0aTvTCEKPinuA], have discovered possible quorum [{node_t1}{R25HgyJkTVmfr2j7Lv3F_Q}{wm2Tz70qS4yDpR1N8otm_A}{node_t1}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}, {node_t0}{okA4QNE9R0aTvTCEKPinuA}{3nNebXXZQ5CCWd61EM4iww}{node_t0}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}]; discovery will continue using [127.0.0.1:13301, 127.0.0.1:13303] from hosts providers and [{node_t0}{okA4QNE9R0aTvTCEKPinuA}{3nNebXXZQ5CCWd61EM4iww}{node_t0}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}, {node_t1}{R25HgyJkTVmfr2j7Lv3F_Q}{wm2Tz70qS4yDpR1N8otm_A}{node_t1}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}] from last-known cluster state; node term 1, last-accepted version 4 in term 1; joining [{node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}] in term [1] has status [waiting for response] after [1ms]",
                            "node3": "master not discovered or elected yet...",
                            "node2": "master not discovered or elected yet..."
                        }
                    },
                    "impacts": [...],
                    "user_actions": [...]
                }
            }
        },
        "data": {...},
        "snapshot": {...}
    }
}

And case 3:

{
    "status": "red",
    "cluster_name": "TEST-TEST_WORKER_VM=[--not-gradle--]-CLUSTER_SEED=[1888251737628875036]-HASH=[117E5600A0F2DB]-cluster",
    "components": {
        "cluster_coordination": {
            "status": "red",
            "indicators": {
                "master_is_stable": {
                    "status": "red",
                    "summary": "No master node observed in the last 30s, and the master eligible nodes are unable to form a quorum",
                    "help_url": "https://ela.st/fix-master",
                    "details": {
                        "current_master": {
                            "node_id": null,
                            "name": null
                        },
                        "recent_masters": [
                            {
                                "node_id": "9hPCQwPsRrCX9g4DjYywzw",
                                "name": "node_t2"
                            }
                        ],
                        "cluster_formation": {
                            "node2": "master not discovered or elected yet, an election requires a node with id [dcA01mVWTiebGysnbNhwiA], have only discovered non-quorum [{node_t0}{ryn0dLlATV2b8WCyIlIN6A}{pVJwZeLPRqKvqSCO53KQ2Q}{node_t0}{127.0.0.1}{127.0.0.1:13303}{m}]; discovery will continue using [127.0.0.1:13301, 127.0.0.1:13302] from hosts providers and [{node_t0}{ryn0dLlATV2b8WCyIlIN6A}{pVJwZeLPRqKvqSCO53KQ2Q}{node_t0}{127.0.0.1}{127.0.0.1:13303}{m}, {node_t1}{dcA01mVWTiebGysnbNhwiA}{OeAmKbeZQD-kOmKimCDY8Q}{node_t1}{127.0.0.1}{127.0.0.1:13301}{m}, {node_t2}{9hPCQwPsRrCX9g4DjYywzw}{kZWBj1g7QZCNWRSGu9EiVg}{node_t2}{127.0.0.1}{127.0.0.1:13302}{m}] from last-known cluster state; node term 2, last-accepted version 7 in term 1; joining [{node_t1}{dcA01mVWTiebGysnbNhwiA}{OeAmKbeZQD-kOmKimCDY8Q}{node_t1}{127.0.0.1}{127.0.0.1:13301}{m}] in term [2] has status [waiting for response] after [30s/30039ms]",
                            "node1": "master not discovered or elected yet...",
                            "node3": "master not discovered or elected yet..."
                        }
                    },
                    "impacts": [...],
                    "user_actions": [...]
                }
            }
        },
        "data": {...},
        "snapshot": {...}
    }
}

And case 4:

{
    "status": "red",
    "cluster_name": "TEST-TEST_WORKER_VM=[--not-gradle--]-CLUSTER_SEED=[-6341988252367058471]-HASH=[11CC4EF6482AF0]-cluster",
    "components": {
        "cluster_coordination": {
            "status": "red",
            "indicators": {
                "master_is_stable": {
                    "status": "red",
                    "summary": "No master node observed in the last 30s, and the cause has not been determined.",
                    "help_url": "https://ela.st/fix-master",
                    "details": {
                        "current_master": {
                            "node_id": null,
                            "name": null
                        },
                        "recent_masters": [
                            {
                                "node_id": "LBrijWVtTiWZm03kHDE1iw",
                                "name": "node_t2"
                            }
                        ],
                        "cluster_formation": {
                            "node1": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [R25HgyJkTVmfr2j7Lv3F_Q, LBrijWVtTiWZm03kHDE1iw, okA4QNE9R0aTvTCEKPinuA], have discovered possible quorum [{node_t1}{R25HgyJkTVmfr2j7Lv3F_Q}{wm2Tz70qS4yDpR1N8otm_A}{node_t1}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}, {node_t0}{okA4QNE9R0aTvTCEKPinuA}{3nNebXXZQ5CCWd61EM4iww}{node_t0}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}]; discovery will continue using [127.0.0.1:13301, 127.0.0.1:13303] from hosts providers and [{node_t0}{okA4QNE9R0aTvTCEKPinuA}{3nNebXXZQ5CCWd61EM4iww}{node_t0}{127.0.0.1}{127.0.0.1:13301}{cdfhilmrstw}, {node_t2}{LBrijWVtTiWZm03kHDE1iw}{Y-h0Er4NSVuS0XPIskjsHA}{node_t2}{127.0.0.1}{127.0.0.1:13303}{cdfhilmrstw}, {node_t1}{R25HgyJkTVmfr2j7Lv3F_Q}{wm2Tz70qS4yDpR1N8otm_A}{node_t1}{127.0.0.1}{127.0.0.1:13302}{cdfhilmrstw}] from last-known cluster state; node term 1, last-accepted version 4 in term 1",
                            "node2": "master not discovered or elected yet...",
                            "node3": "master not discovered or elected yet..."
                        }
                    },
                    "impacts": [...],
                    "user_actions": [...]
                }
            }
        },
        "data": {...},
        "snapshot": {...}
    }
}

@masseyke masseyke marked this pull request as ready for review June 30, 2022 21:19
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jun 30, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@masseyke masseyke requested a review from andreidan June 30, 2022 21:20
Contributor

@andreidan andreidan left a comment

Thanks for working on this Keith.

I have some questions about the approach we took here.

Also, would you mind trimming the description to contain only the relevant parts? (i.e. the master_is_stable indicator, without impacts and the like; it seems that only summary and details are affected)

Comment on lines 313 to 322
} else if (clusterService.localNode().isMasterNode() == false) { // none is elected master and we aren't master eligible
    // NOTE: The logic in this block will be implemented in a future PR
    result = new CoordinationDiagnosticsResult(
        CoordinationDiagnosticsStatus.RED,
        "No master has been observed recently",
        CoordinationDiagnosticsDetails.EMPTY
    );
} else { // none is elected master and we are master eligible
    result = diagnoseOnHaveNotSeenMasterRecentlyAndWeAreMasterEligible(localMasterHistory, masterEligibleNodes, explain);
}
Contributor

I think this if/else block is becoming hard to follow and reason about,
i.e. where do we check that we aren't a master-eligible node? The last else statement has a bunch of implicit decisions that are hard to verify (i.e. why are we sure we're master eligible in this case?)

Member Author

The else if right above the else checks if we're master eligible. I'll try to make it more explicit.
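One way to make that decision explicit is to name the eligibility check and branch on it directly instead of relying on a trailing else. The sketch below is only an illustration under that assumption: `BranchSketch`, `Branch`, and `classifyNoMasterCase` are hypothetical names, and the boolean stands in for `clusterService.localNode().isMasterNode()`.

```java
final class BranchSketch {
    enum Branch { NOT_MASTER_ELIGIBLE, MASTER_ELIGIBLE }

    /** Names the eligibility decision so neither branch depends on an implicit else. */
    static Branch classifyNoMasterCase(boolean localNodeIsMasterEligible) {
        if (localNodeIsMasterEligible) {
            // Here the real code would call diagnoseOnHaveNotSeenMasterRecentlyAndWeAreMasterEligible(...)
            return Branch.MASTER_ELIGIBLE;
        } else {
            // Logic deferred to a future PR; the real code reports RED with empty details for now
            return Branch.NOT_MASTER_ELIGIBLE;
        }
    }

    public static void main(String[] args) {
        System.out.println(classifyNoMasterCase(true));
        System.out.println(classifyNoMasterCase(false));
    }
}
```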

* @param nodeToClusterFormationStateMap A map of each master node to its ClusterFormationState
* @return true if there are discovery problems, false otherwise
*/
private boolean hasDiscoveryProblems(
Contributor

This is a bit ambiguous w.r.t. what it is diagnosing - who has discovery problems?

Maybe we can be more intentional in the method name and also return (or log?) the problems we discover,
i.e. who cannot discover which node?

Member Author

I was originally calculating and returning this, but removed that since we're not going to be putting it in the details in the response. I can change it to log that information for now.
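A rough sketch of what computing and logging that information could look like is below. It is not the code in this PR: the input shape (each node id mapped to the set of node ids it reports having discovered) is an assumption, `DiscoveryProblemSketch` and `findUndiscoveredNodes` are hypothetical names, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.logging.Logger;

final class DiscoveryProblemSketch {
    private static final Logger logger = Logger.getLogger(DiscoveryProblemSketch.class.getName());

    /**
     * @param discoveredByNode assumed input: for each master-eligible node id, the ids of the
     *                         master-eligible nodes it reports having discovered
     * @return for each node with a problem, the sorted ids of the nodes it has not discovered
     */
    static Map<String, Set<String>> findUndiscoveredNodes(Map<String, Set<String>> discoveredByNode) {
        Map<String, Set<String>> problems = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : discoveredByNode.entrySet()) {
            Set<String> missing = new TreeSet<>(discoveredByNode.keySet());
            missing.removeAll(entry.getValue());
            missing.remove(entry.getKey()); // a node does not need to discover itself
            if (missing.isEmpty() == false) {
                logger.warning(entry.getKey() + " cannot discover " + missing);
                problems.put(entry.getKey(), missing);
            }
        }
        return problems;
    }

    /** Boolean view, matching the shape of the hasDiscoveryProblems check under review. */
    static boolean hasDiscoveryProblems(Map<String, Set<String>> discoveredByNode) {
        return findUndiscoveredNodes(discoveredByNode).isEmpty() == false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> views = Map.of(
            "node_t0", Set.of("node_t0", "node_t1", "node_t2"),
            "node_t1", Set.of("node_t0", "node_t1"),
            "node_t2", Set.of("node_t0", "node_t1", "node_t2")
        );
        System.out.println(findUndiscoveredNodes(views));
    }
}
```

Returning the map keeps the door open for adding it to the details later, while the boolean wrapper preserves the current call site.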

* @param nodeToClusterFormationStateMap A map of each master node to its ClusterFormationState
* @return True if any nodes in nodeToClusterFormationStateMap report a problem forming a quorum, false otherwise.
*/
private boolean hasQuorumProblems(
Contributor

Same as above - would it be useful to log the problems we discover? Or return them?

Member Author

Yeah at some point we'll be putting them into the details section of the response. For now I'll log them.

Contributor

Just to confirm - this means they don't currently add any new information compared to what we provide in the generic cluster_formation.description field. Is that correct?

Member Author

Right. That was the conclusion to the discussion on the document I shared about this:

W.r.t. structure I don’t think we have a clear indication as to how the details section of the `master_is_stable` indicator is going to be used for now, so I’d suggest we keep the `ClusterFormationState#description` as the only field in the `master_is_stable` details field for now and add structure at a later phase.

Once the health API is used in the diagnostics bundle (very soon) we’ll be able to get some more engineers exposed to the `details` field and get some feedback about the needs and shortcomings.

Contributor

++ I was just curious if we've gained extra information in these diagnostic steps

Thanks for the confirmation

@andreidan
Contributor

Would we benefit from attaching #87482 (comment) to the meta issue ?

@masseyke
Member Author

masseyke commented Jul 5, 2022

Would we benefit from attaching #87482 (comment) to the meta issue ?

Added at #85624 (comment)

masseyke added 2 commits July 5, 2022 14:50
…b.com:masseyke/elasticsearch into feature/health-api-master-stability-discovery
@masseyke masseyke requested a review from andreidan July 5, 2022 21:34
@masseyke
Member Author

@elasticmachine update branch

@masseyke masseyke requested a review from andreidan July 14, 2022 19:15
@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:06
Contributor

@andreidan andreidan left a comment

Thanks for iterating on this Keith

Left one suggestion.

+ "eligible nodes",
nodeHasMasterLookupTimeframe
),
getDetails(explain, localMasterHistory, null, coordinator.getClusterFormationState().getDescription())
Contributor

I've seen @DaveCTurner use the cluster formation details from all involved nodes.

I think, since we've got hold of all the master nodes' views of the cluster formation, we should report each node's view under the details section.

What do you think?

Member Author

Yeah we definitely need to do that. It sounds like we had a miscommunication earlier -- I thought you wanted that information removed so I removed it. I'll find a way to put it back in.

Member Author

OK, if there is a discovery or quorum problem, I am now putting the cluster formation descriptions for all master nodes into the details for debugging purposes.
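Collecting one description per master node into a flat map could look roughly like the sketch below. It is an illustration only: `FormationState` is a hypothetical stand-in that exposes just a description, and `toDescriptions` is an assumed helper name, not a method from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ClusterFormationDetailsSketch {
    /** Hypothetical stand-in exposing only the description of a node's cluster formation state. */
    record FormationState(String description) {}

    /** Maps each node id to that node's description, preserving the iteration order of the input. */
    static Map<String, String> toDescriptions(Map<String, FormationState> statesByNodeId) {
        Map<String, String> descriptions = new LinkedHashMap<>();
        statesByNodeId.forEach((nodeId, state) -> descriptions.put(nodeId, state.description()));
        return descriptions;
    }

    public static void main(String[] args) {
        Map<String, FormationState> states = new LinkedHashMap<>();
        states.put("node1", new FormationState("master not discovered or elected yet..."));
        states.put("node2", new FormationState("master not discovered or elected yet..."));
        System.out.println(toDescriptions(states));
    }
}
```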

assertThat(result.summary(), containsString(" some master eligible nodes are unable to discover other master eligible nodes"));
}

public void testAnyNodeInClusterReportsDiscoveryProblems() {
Contributor

❤️

@masseyke
Member Author

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

@masseyke
Member Author

@elasticmachine run elasticsearch-ci/bwc

@masseyke masseyke requested a review from andreidan July 26, 2022 13:29
Contributor

@andreidan andreidan left a comment

Thanks for iterating on this Keith

Just a couple more questions left

@@ -361,7 +511,8 @@ private CoordinationDiagnosticsResult getResultOnNoMasterEligibleNodes(MasterHis
 CoordinationDiagnosticsDetails details = getDetails(
     explain,
     localMasterHistory,
-    coordinator.getClusterFormationState().getDescription()
+    null,
+    Map.of(coordinator.getLocalNode().getId(), coordinator.getClusterFormationState().getDescription())
Contributor

Would it be beneficial if the local node's view were expressed separately? Or, to put it the other way, should the remote ones be expressed under a new object in the representation? remote, or something better named?

Contributor

Also, would we still want to have the cluster formation expressed as objects for future extension? (in case we'll want to add some structure)

i.e.:

"localNode": { "description": "issues" },
"remote": {
    "node1": { "description": "issues" },
    "node2": { "description": "other things" }
}
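For comparison with the flat map, the nested shape suggested here could be assembled roughly as follows. This is only a sketch of the proposal: `LocalRemoteDetailsSketch` and `split` are hypothetical names, and the "localNode", "remote", and "description" keys are taken from the snippet above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LocalRemoteDetailsSketch {
    /** Wraps each description in an object and separates the local node's view from the remote ones. */
    static Map<String, Object> split(String localNodeId, Map<String, String> descriptionsByNodeId) {
        Map<String, Object> remote = new LinkedHashMap<>();
        Map<String, Object> result = new LinkedHashMap<>();
        descriptionsByNodeId.forEach((nodeId, description) -> {
            Map<String, String> wrapped = Map.of("description", description);
            if (nodeId.equals(localNodeId)) {
                result.put("localNode", wrapped);
            } else {
                remote.put(nodeId, wrapped);
            }
        });
        result.put("remote", remote);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> descriptions = new LinkedHashMap<>();
        descriptions.put("node0", "issues");
        descriptions.put("node1", "other things");
        System.out.println(split("node0", descriptions));
    }
}
```

Wrapping each description in its own object is what would allow extra fields to be added later without changing the shape of the response.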

Member Author

I couldn't really think of any reason, so I put them together to simplify the response. Can you think of a reason it would be useful to have them separate?

Contributor

I was wondering if, in large clusters, it would become a bit difficult to pick out the local view from a list of remote views. Maybe I'm over-optimising.
Could you please update the PR description with the latest responses, and let's move ahead with the simple solution for now (we could iterate afterwards to improve it, once we see it in live cases).

Contributor

@andreidan andreidan left a comment

LGTM, thanks for implementing this Keith

@masseyke
Member Author

@elasticmachine run elasticsearch-ci/part-1

@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
@masseyke masseyke merged commit 41d7280 into elastic:main Jul 27, 2022
@masseyke masseyke deleted the feature/health-api-master-stability-discovery branch July 27, 2022 15:26